Abstract

Program traces are used for analysis of program performance, memory utilization, and
communications as well as for program debugging. The trace contains records of execution
events generated by monitoring units inserted into the program. The trace size limits
the resolution of execution events and restricts the user's ability to analyze the program
execution. We present a study of the information content of program traces and develop a
coding scheme which reduces the trace size to the limit given by the trace entropy. We
apply the coding to the traces of AIMS instrumented programs executed on the IBM SP2
and the SGI Power Challenge and compare it with other coding methods. Our technique
shows that the size of the trace can be reduced by more than a factor of 5.
1. Introduction


A program trace contains records of the events that happened during the
program execution. Each record contains the event identifier, the location
and the time when the event happened. Depending on the event type, the record
may contain additional information such as a message tag and a message
destination. The trace records are generated by the monitoring units
inserted into the program. This insertion can be done with an instrumentation
tool into the source code [4], into assembly code or even into a
loaded and running program [11]. An analysis of trace records gives data
on the program performance and memory utilization and helps in program
debugging [13]. There are several tools for program instrumentation,
monitoring and trace visualization: prof, gprof, pixie [3],
AIMS [4], Paradyn [11].

The larger the number of processors and the finer the event resolution,
the larger the program trace. The trace size (a common trace size is a dozen
megabytes) limits the user's ability to monitor execution events and to
localize the statements causing problems with the program. A number of
studies were done to reduce the trace size [6]. In this paper we use an
information-theoretic technique to reduce the average trace size to the
theoretical lower bound given by the trace entropy. Theoretically, the technique
is based on the Noiseless Coding Theorem [1], which asserts that the average code
length of a random variable cannot be less than the entropy of this
variable. The appropriate code length can be achieved by using Huffman
coding, or by using dynamic Huffman coding [9,10] if the distribution
is unknown a priori. In practice, this technique is based on the collection
of an event histogram and on the application of Huffman coding. For
the set of records considered in this paper, the trace entropy is the sum of
the entropy of the program Markov chain, the communication entropy
and the entropy of time stamps. We give a method for calculation of the
three components of trace entropy and compare the entropy with the results
of applying standard compression techniques (compress and gzip).
For the four traces we considered, our compression is better than gzip
and reduces the size of the trace by a factor of 5.
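
The histogram-plus-Huffman step mentioned above can be sketched as follows. This is a minimal illustration in Python, not the coder used for the AIMS traces; the event names and counts are hypothetical, and the function names are introduced here only for the example. It builds Huffman code lengths from an event histogram and compares the average code length with the entropy of the histogram.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(histogram):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    if len(histogram) == 1:
        # A single symbol still needs one bit per record.
        return {symbol: 1 for symbol in histogram}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {symbol: 0}) for i, (symbol, count) in enumerate(histogram.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def entropy_bits(histogram):
    """Shannon entropy (base 2) of the empirical distribution of the histogram."""
    total = sum(histogram.values())
    return -sum(c / total * math.log2(c / total) for c in histogram.values())


# Hypothetical event identifiers; a real trace would supply its own records.
events = ["loop"] * 8 + ["send"] * 4 + ["recv"] * 2 + ["barrier"] + ["exit"]
histogram = Counter(events)
lengths = huffman_code_lengths(histogram)
total = sum(histogram.values())
average_length = sum(histogram[s] * lengths[s] for s in histogram) / total
print(f"entropy = {entropy_bits(histogram):.3f} bits/record")
print(f"Huffman = {average_length:.3f} bits/record")
```

In this example all probabilities are powers of 1/2, so the Huffman code meets the entropy bound exactly; for general histograms the average length stays within one bit of the entropy.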


2. The Program Graph 

Depending on the abstraction level, a sequential program can be represented
by its call graph [3] or flow graph [2]. In this paper we use the flow
graph representation. The vertices of the flow graph are basic blocks of
the program code, and arcs are the possible ways for the program
counter (pc) to move between basic blocks. If the pc is pointing to a basic
block i, we will say that the program is in the state i. A transition of the
program from one state to another will be referred to as an event, by saying that
event i->j occurred if the pc moves from state i to state j.

An execution of a sequential program can be represented by a path in the
flow graph. The path starts at the first statement of the main function
(start) and terminates at the exit. If a program is in a state i and event i->j
occurs, we add the arc (i,j) to the path. The length of the execution is the
number of arcs in the path, see Figure 1. If only a subset of states is
monitored and only transitions between this reduced set of states are recorded,
the flow graph can be reduced to a smaller graph by discarding nonmonitored
states and by adding arcs reflecting possible transitions between
the reduced set of states. This reduced flow graph will be referred to as the
program graph. We will refer to the nodes of the graph as states.
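
The reduction of a flow graph to a program graph can be sketched as below. This is one possible reading of the construction, assuming that an arc u->v is added whenever v can be reached from u through nonmonitored states only; the graph, the block names and the function name are hypothetical.

```python
def reduce_flow_graph(flow_graph, monitored):
    """Keep only monitored states; add an arc u->v when v is reachable from u
    through nonmonitored states only."""
    reduced = {}
    for u in monitored:
        targets = set()
        stack = list(flow_graph.get(u, ()))
        seen = set()
        while stack:
            v = stack.pop()
            if v in monitored:
                targets.add(v)  # stop at the first monitored state on this route
            elif v not in seen:
                seen.add(v)
                stack.extend(flow_graph.get(v, ()))
        reduced[u] = sorted(targets)
    return reduced


# Hypothetical flow graph: basic blocks b1..b4 between start and exit.
flow_graph = {
    "start": ["b1"],
    "b1": ["b2", "b3"],
    "b2": ["b4"],
    "b3": ["b4"],
    "b4": ["b1", "exit"],
    "exit": [],
}
program_graph = reduce_flow_graph(flow_graph, monitored={"start", "b1", "b4", "exit"})
print(program_graph)
```

Here the two branches through b2 and b3 collapse into a single arc b1->b4, so the reduced graph no longer distinguishes which branch was taken.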



FIGURE 1. A program graph and an execution path

We will consider traces of message passing parallel programs. An execution
of each process of a parallel program can be represented by a path in
the program graph as described above. In a message passing program
pairs of processes interact by exchanging messages. This interdependency
can be specified by a sequence of pairwise matching
sends/receives.

The coding of traces of sequential programs is considered in sections 3 
and 4. We consider coding of messages of message passing programs in 
section 6. Coding of time stamps in parallel programs is considered in 
section 7. Results of calculations of the minimal length of AIMS traces are 
tabulated in section 8. 
3. Number of Possible Traces

A simple lower bound for the length of a lossless code of a trace can be
obtained by comparing the number of traces with the number of codes.
There are at most $O(2^L)$ codes of length $L$ in the {0,1} alphabet, hence the
maximum code length cannot be less than the logarithm of the number
of traces with $N$ records (footnote 1).

The number of traces with $N$ records is the number of directed paths in
the program graph $G$ from start to exit of length $N$. This number can be
estimated through eigenvalues of the adjacency matrix of $G$:

$$A = [a_{ij}], \quad 1 \le i, j \le n$$

where $n$ is the number of states, $a_{ij} = 1$ if there is an arc from the state $i$ to
the state $j$ and $a_{ij} = 0$ otherwise. The number of paths from $i$ to $j$ of length
$N$ equals the $ij$-th element of $A^N$, see [7]. Let

$$A = U^{-1} \Lambda U$$

be a spectral decomposition of $A$, where $\Lambda$ is a diagonal matrix with
$\lambda_1, \dots, \lambda_n$, $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$, as elements on its main
diagonal (footnote 2). Then $A^N = U^{-1} \Lambda^N U$ and for the number of paths of
length $N$ from $i$ to $j$ we have a formula:

$$a^N_{ij} = \lambda_1^N v_{i1} u_{1j} + \dots + \lambda_n^N v_{in} u_{nj}$$

where $v_{ik}$ and $u_{kj}$ are elements of matrices $U^{-1}$ and $U$ respectively. Hence,
$\log a^N_{ij} = O(N \log |\lambda_1|)$, and $\log a^N_{ij} = \Theta(N \log |\lambda_1|)$ (footnote 3) if we
assume that $v_{i1}$ and $u_{1j}$ are nonzero.
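
The relation between path counts and the largest eigenvalue can be checked numerically on a small example. The sketch below uses a hypothetical four-state program graph chosen so that the largest eigenvalue modulus is known in closed form (the golden ratio); it counts start-to-exit paths as an element of the matrix power and compares the growth rate of the logarithm with the logarithm of that eigenvalue.

```python
import math


def mat_mul(a, b):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def count_paths(adjacency, length, source, target):
    """Number of directed paths of the given length: an element of A**length."""
    n = len(adjacency)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    for _ in range(length):
        power = mat_mul(power, adjacency)
    return power[source][target]


# Hypothetical program graph: 0 = start, 1 and 2 = loop states, 3 = exit.
A = [
    [0, 1, 0, 0],  # start -> 1
    [0, 1, 1, 0],  # 1 -> 1, 1 -> 2
    [0, 1, 0, 1],  # 2 -> 1, 2 -> exit
    [0, 0, 0, 0],  # exit has no outgoing arcs
]
# For this particular matrix the largest eigenvalue modulus is the golden ratio:
# the loop block [[1, 1], [1, 0]] has characteristic polynomial x^2 - x - 1.
lam = (1 + math.sqrt(5)) / 2

for N in (12, 24, 48):
    paths = count_paths(A, N, source=0, target=3)
    print(N, paths, f"log2(paths)/N = {math.log2(paths) / N:.3f}",
          f"log2(lambda) = {math.log2(lam):.3f}")
```

As N grows, the per-record value log2(paths)/N approaches log2(lambda) from below, which is the worst-case number of bits per record for this graph.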

The maximal modulus $\lambda$ of eigenvalues of the adjacency matrix can be
estimated by means of maximal and minimal in- and out-degrees of the
states:

$$\max\{\min_i d_i^{in}, \min_i d_i^{out}\} \le \lambda \le \min\{\max_i d_i^{in}, \max_i d_i^{out}\}$$


1. All logs are to base 2 in this paper.

2. If $A$ has nontrivial Jordan blocks our arguments can be modified appropriately without
affecting the result.

3. $f(n) = \Theta(g(n))$ if there are constants $c$ and $C$ such that $c\,g(n) \le f(n) \le C\,g(n)$. $\Theta(g(n))$ is a class of
functions and we use the notation $f(n) = \Theta(g(n))$ to indicate that $f$ is in this class.
In a typical program graph, the degrees of the majority of states are bounded by
a constant, and only a few states have larger degrees. This property can
be exploited for tighter bounding of the maximum modulus of the eigenvalues.
If, for example, one state has (large) degree $D$, and the degrees of all
other states are bounded by $b$, then using the Gershgorin Circle Theorem
[12] we can get a sharper upper bound for the largest eigenvalue:



+ 




2 


We can conclude that the minimum code length of a trace in the worst
case is the logarithm of the number of possible $N$-event traces and can be
estimated as

$$L(N) = O(N \log \lambda)$$

where $\lambda$ is the maximal modulus of eigenvalues of the adjacency matrix
of the program graph.


4. The Entropy of Traces 

The transition of the pc from one state to another can be considered as
a stochastic event. It means that each trace $T$ has a probability $p(T)$ to
appear as a trace of a program. We want to minimize the expected length
of the trace code:

$$\sum_T c(T)\, p(T) \to \min$$

where $c(T)$ is the code length of $T$.

The minimum can be obtained from the Noiseless Coding Theorem
[1, p. 37] which states that for any lossless coding of a random variable
with distribution $\{p_i\}$ the minimum average code length cannot be less
than the entropy of this distribution, that is, $H = -\sum_i p_i \log p_i$. This bound
can be closely approached with a code having the $i$-th code word length of
$-[\log p_i] + 1$, see [1, p. 39].


It follows that the minimum of the expected length of the trace code is close to
the entropy of the set of traces $\{T\}$. We will express the probability $p(T)$
through the transition probabilities of the program. For the probability
$p_j^{n+1}$ of a path of length $n+1$ to terminate in a state $j$ we have a relation:

$$p_j^{n+1} = \sum_i p_i^n\, p_{ij}$$

where $p_{ij}$ is the transition probability from state $i$ to state $j$. The relation
holds in the case when $p_{ij}$ is independent of the path by which the state $i$
was reached as well as of the value of $n$. This property is known as the
Markov property [1]. For now we will assume that the program has this
property. At the end of the section we will discuss a modification of state
definitions and enhancements to the program model which can be made
in the case where the state transitions lack the Markov property.

In matrix form, the relation above can be written as follows: $(p^{n+1})^t = (p^n)^t Q$,
where $p^n$ is the probability column vector, $Q = [p_{ij}]$ is the matrix of
transition probabilities and the superscript $t$ means the transposition. If
we consider a Markov chain [1] with matrix $Q$ as a transition matrix then
$p^n$ converges to a steady state probability vector of the Markov chain. The
steady state vector $w$ is defined as the left eigenvector of $Q$ with
eigenvalue 1:

$$w = wQ$$

If $Q$ is an irreducible matrix the Perron-Frobenius theorem [8, p. 399]
asserts that $w$ exists and is unique.

In addition to the stationary distribution, we need the entropy of each
state $i$:

$$H(i) = -\sum_j p_{ij} \log p_{ij}$$

If $x$ is a sequence of states, then $T_x$ will denote a trace with suffix $x$. The
superscript of trace $T^n$ denotes the number of events in the trace.

Now we can write a recurrence relation for the entropy of traces of length
$n+1$ through the entropy of traces of length $n$:

$$\begin{aligned}
H_{n+1} &= -\sum_{T^{n+1}} p(T^{n+1}) \log p(T^{n+1}) \\
&= -\sum_j \sum_i \sum_{T^{n+1}_{ij}} p(T^{n+1}_{ij}) \log p(T^{n+1}_{ij}) \\
&= -\sum_j \sum_i \sum_{T^n_i} p(T^n_i)\, p_{ij} \left( \log p(T^n_i) + \log p_{ij} \right) \\
&= -\sum_i \sum_{T^n_i} \Big( p(T^n_i) \log p(T^n_i) \sum_j p_{ij} + p(T^n_i) \sum_j p_{ij} \log p_{ij} \Big) \\
&= -\sum_{T^n} p(T^n) \log p(T^n) + \sum_i \Big( \sum_{T^n_i} p(T^n_i) \Big) H(i) \\
&= H_n + \sum_i w_i H(i)
\end{aligned}$$
Here we use the above mentioned Markov property of the chain:

$$p(T^{n+1}_{ij}) = p(T^n_i)\, p_{ij}$$

and the property of the steady state vector of the Markov chain: for large $n$

$$\sum_{T^n_i} p(T^n_i) \approx w_i$$

meaning that the probability of state $i$ is equal to the probability of arriving
at $i$ in $n$ steps.

From this relation it follows that for large $n$ we can express the entropy of
traces through a steady state vector of a Markov chain and the entropy of
its states:

$$H_n = n \sum_i w_i H(i)$$

which gives us a lower bound for the expected length of the trace code.
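
The bound above can be evaluated directly once the transition matrix is known. The following sketch, for a hypothetical two-state chain, approximates the steady state vector by repeated multiplication (an assumption that works here because this chain is irreducible and aperiodic) and then computes the per-event entropy as the weighted sum of the state entropies.

```python
import math


def stationary_distribution(Q, iterations=1000):
    """Approximate w = wQ by repeatedly applying Q to a uniform start vector."""
    n = len(Q)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(w[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return w


def state_entropy(row):
    """H(i) = -sum_j p_ij log2 p_ij, skipping zero-probability transitions."""
    return -sum(p * math.log2(p) for p in row if p > 0)


# Hypothetical transition matrix: state 0 loops or moves on with equal
# probability, state 1 always returns to state 0.
Q = [
    [0.5, 0.5],
    [1.0, 0.0],
]
w = stationary_distribution(Q)
rate = sum(w_i * state_entropy(row) for w_i, row in zip(w, Q))
n_events = 10_000
print(f"steady state: {w}")
print(f"entropy per event: {rate:.4f} bits")
print(f"lower bound for {n_events} events: {n_events * rate:.0f} bits")
```

For this chain the steady state is (2/3, 1/3) and only state 0 contributes entropy, so the bound is 2/3 of a bit per event.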

If the Markov property does not hold, then we can consider a more general
program model. It will lead us to a similar formula for the trace entropy;
however, computing the entropy on the right hand side is more difficult. The
states of this more general model will be sequences of $l$ program states
$u = i_1 i_2 \dots i_l$ and events will be transitions $i_1 i_2 \dots i_l \to i_2 \dots i_l i_{l+1}$.
If the value of $l$ is such that the program does not remember how it got
into state $u$, then the Markov property will be true for the model and
arguments similar to the ones above can be applied.


5. The Ergodicity of a Program 

How accurately does a program trace represent other possible executions
of the program? If the set of program inputs can be classified into different
categories and the program behavior varies between categories, then
a trace on one input tells little about traces on inputs from a different
category. In order to get correct transition probabilities, a mixture of
traces of runs on inputs from different categories is necessary.

A program is called ergodic if the transition probabilities are independent 
of program execution. Coding of traces of an ergodic program can be sig- 
nificantly simplified. The transition probabilities can be estimated from 
the trace of preliminary program execution. These probabilities can be 
used for generating the Huffman code of the trace records of other execu- 
tions of the program. 
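
Estimating the transition probabilities from a preliminary run can be sketched as follows. The state names and both traces are hypothetical; the second function reports the number of bits an ideal code driven by the estimated probabilities would spend on another trace, which is the quantity a Huffman code built from those probabilities approximates.

```python
import math
from collections import Counter, defaultdict


def estimate_transitions(trace):
    """Estimate p_ij as (number of i->j events) / (number of events leaving i)."""
    counts = defaultdict(Counter)
    for state, next_state in zip(trace, trace[1:]):
        counts[state][next_state] += 1
    return {
        state: {nxt: c / sum(successors.values()) for nxt, c in successors.items()}
        for state, successors in counts.items()
    }


def ideal_code_length(trace, probabilities):
    """Sum of -log2 p_ij over the events of a trace.

    Raises KeyError for a transition never seen in the preliminary run; a
    practical coder would need an escape code for that case.
    """
    return -sum(math.log2(probabilities[i][j]) for i, j in zip(trace, trace[1:]))


# Hypothetical state names; both traces are made up for illustration.
preliminary_trace = ["start", "a", "b", "a", "b", "a", "c", "exit"]
probabilities = estimate_transitions(preliminary_trace)
other_trace = ["start", "a", "b", "a", "c", "exit"]
print(probabilities["a"])
print(f"{ideal_code_length(other_trace, probabilities):.2f} bits")
```

The estimate is only as good as the assumption that the second run follows the same transition probabilities as the first, which is the ergodicity condition discussed above.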
6. The Communication Entropy of Parallel Message Passing
Programs

We confine our considerations to single-threaded processes communicating
by passing messages. Each process can be described by a program
graph as explained in section 2. Transitions between the states in different
processes are interdependent through the exchange of messages. This causality
relation between events in different processes can be uniquely
reconstructed if the program trace contains a set of pairwise matchable
sends/receives.

We will assume that the message passing program is MPI compliant, 
meaning that the messages can be matched using MPI progress and order 
rules, [4, pp.30-31]. 

• Progress: "If a pair of matching send and receives have been ini- 
tialized on two processes, then at least one will complete..." 

• Order: If two messages with the same tag are sent from the same 
source to the same destination, then they are received in the 
same order they were sent (the symmetrical property is true for 
the receiver). 

The first rule implies that if there are sends matching a receive, then the 
receive must be matched with one of the sends (symmetrical for sends). 
From the second rule it follows that if there are several sends matching a 
receive, then the receive will be matched with the first posted send (sym- 
metrical for sends). The second rule also implies that the message passing 
program is deterministic if wild card MPI_ANY_SOURCE and calls 
MPI_CANCEL and MPI_WAITANY are not used and no MPI error 
occurred. 

These rules together give rise to a unique way to match messages using
the process id, message tag and message order. Let $S_{ij}$ and $R_{ji}$ be lists of
sends and receives with source $i$ and destination $j$. The matching can be
done by the iteration of the following step:

• Take the first receive $r$ in $R_{ji}$ and find the first matching send $s$ in $S_{ij}$
(since the messages already have the same source and the same
destination it means we take the first send with the same tag).
Remove $r$ and $s$ from the lists.
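
The matching step above can be sketched for one (source, destination) pair as follows. The representation of the lists as chronological sequences of tags and the function name are assumptions made for the illustration.

```python
def match_messages(sends, receives):
    """Match receives to sends for one (source, destination) pair.

    sends and receives are chronological lists of tags; the result is a list
    of (receive index, send index) pairs.
    """
    unmatched = list(enumerate(sends))  # (send index, tag), in posting order
    pairs = []
    for r_index, tag in enumerate(receives):
        for position, (s_index, s_tag) in enumerate(unmatched):
            if s_tag == tag:
                pairs.append((r_index, s_index))
                del unmatched[position]  # remove the matched send from the list
                break
        else:
            raise ValueError(f"no send matches receive {r_index} with tag {tag!r}")
    return pairs


# Hypothetical tags: the receiver takes the "b" message before the first "a".
sends = ["a", "b", "a"]
receives = ["b", "a", "a"]
print(match_messages(sends, receives))
```

Each receive is paired with the earliest unmatched send carrying the same tag, so messages with equal tags keep their order while messages with different tags may overtake each other.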

Tagging messages allows changing the order in which the messages are
received. Two messages received in an order different from that in which they
were sent are called intertwined, cf. [4, p. 31]. An appropriate tagging
allows receiving messages in an arbitrary order. In practice, intertwining
is limited by the size of system buffers. An example of an intertwined
sequence of messages is shown in Figure 2. For the trace records of
messages it means that the record should contain the message
destination/source and the message tag. Otherwise the causality relation of
events in different processes will not be uniquely reconstructible from the
trace records.


FIGURE 2. Intertwining of two sequences by using tags a, b and c

We assume that the message passing program is deterministic. The
program sends/receives can be matched uniquely if for each pair of
processes $i$ and $j$, a chronological list $S_{ij}$ of tags of messages sent by $i$ to $j$ and
a chronological list $R_{ij}$ of tags of messages received by $i$ from $j$ are
specified. These lists can be composed into a send matrix $S = [S_{ij}]$ and a receive
matrix $R = [R_{ij}]$ of the program. These matrices specify uniquely the
causality relation between events of the program.

A pair of send/receive matrices is consistent if the length $s_{ij}$ of the list $S_{ij}$ is
equal to the length $r_{ji}$ of the list $R_{ji}$ and the number $t_{ij}^k$ of tags with value
$k$ is the same in both lists. In other words the matrix $[r_{ij}]$ is a transposition
of the matrix $[s_{ij}]$ and the list $R_{ji}$ is a permutation of the list $S_{ij}$. Any
consistent pair of matrices can arise as a send/receive matrix pair of the
program (it is sufficient to set message tags according to the list
elements).

Now we fix $s_{ij} = r_{ji}$ and $t_{ij}^k$ and count the number of possible consistent $S$
and $R$ matrices. As in section 3 the logarithm of this number will give us
the minimum code length of the communication matrices of the
program.

Let $s_i$ be the total number of sends in process $i$, $s_i = s_{i1} + s_{i2} + \dots + s_{ip}$, where
$p$ is the number of processes. For the given $s_{i1}, s_{i2}, \dots, s_{ip}$ the total number
of possible send sequences in the process $i$ (which is the number of possible
instances of the $i$-th row of the matrix $S$) is

$$S_i = \frac{s_i!}{s_{i1}!\, s_{i2}! \cdots s_{ip}!} \prod_{j=1}^{p} T_{ij}$$

where

$$T_{ij} = \frac{s_{ij}!}{t_{ij}^{1}!\, t_{ij}^{2}! \cdots t_{ij}^{K}!}$$

is the number of possible lists of length $s_{ij}$ having $t_{ij}^k$ tags with value $k$.
The send sequences in different processes are independent, so the total
number of send lists equals $\prod_i S_i$. A similar formula for the $j$-th row of
matrix $R$ is true:

$$R_j = \frac{r_j!}{s_{1j}!\, s_{2j}! \cdots s_{pj}!} \prod_{i=1}^{p} T_{ij}$$

where $r_j = s_{1j} + s_{2j} + \dots + s_{pj}$. Let $M = s_1 + s_2 + \dots + s_p = r_1 + r_2 + \dots + r_p$ be the
total number of messages in the program.

The total number of send/receive matrices equals $\prod_i S_i \prod_j R_j$. The logarithm
of this number can be well approximated with the Stirling formula as:
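$$M(H_s + H_r + 2H_t)$$

where

$$H_s = -\sum_{i=1}^{p} \frac{s_i}{M} \sum_{j=1}^{p} \frac{s_{ij}}{s_i} \log \frac{s_{ij}}{s_i}, \qquad
H_r = -\sum_{j=1}^{p} \frac{r_j}{M} \sum_{i=1}^{p} \frac{s_{ij}}{r_j} \log \frac{s_{ij}}{r_j}$$

are the entropy of the program sends and receives respectively and

$$H_t = -\sum_{i,j} \frac{s_{ij}}{M} \sum_{k=1}^{K} \frac{t_{ij}^k}{s_{ij}} \log \frac{t_{ij}^k}{s_{ij}}$$

is the entropy of message tags in the program, where $K$ is the number of
tags.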


We can conclude this section by saying that the minimal code length of
communication events of a message passing program equals the number
of program messages times the sum of the communication entropy of
the program and of the entropy of the message tags. The last two
numbers can be easily computed if the trace records contain the message
source, destination and message tags, cf. Table 3.
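
The Stirling approximation used in this section rests on the fact that the logarithm of a multinomial coefficient is close to the list length times the entropy of the frequencies. A minimal numerical check for a single tag list is sketched below; the tag counts are hypothetical and the function names are introduced only for this example.

```python
import math


def log2_multinomial(counts):
    """Exact log2 of s! / (t_1! t_2! ... t_K!), computed with log-gamma."""
    s = sum(counts)
    log_value = math.lgamma(s + 1) - sum(math.lgamma(t + 1) for t in counts)
    return log_value / math.log(2)


def entropy_estimate(counts):
    """Stirling-style estimate s * H, with H the entropy of the tag frequencies."""
    s = sum(counts)
    return -sum(t * math.log2(t / s) for t in counts if t > 0)


# Hypothetical tag counts for one (source, destination) pair.
tag_counts = [300, 200, 100]
exact = log2_multinomial(tag_counts)
approx = entropy_estimate(tag_counts)
print(f"exact log2 of the number of lists: {exact:.1f} bits")
print(f"list length times tag entropy:     {approx:.1f} bits")
```

The entropy estimate slightly overestimates the exact count, and the relative gap shrinks as the lists get longer.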


7. Time Stamps in Program Trace 

Assume that the sequence of trace events is known and we want to specify
the time when the events occurred. The direct recording of the time
stamps expressed in counts of clock periods or time resolution of the
clock counter would give rise to a code of length

$$L = \sum_i N_i \log T_i$$

where $T_i$ is the execution time of the $i$-th process and $N_i$ is the number of
events in the trace of the $i$-th process. Should we record the elapsed time
between events we would have a similar formula for the code length of
the time stamps with $T_i$ replaced by the maximum time between events
in the $i$-th process, which can be substantially smaller than $T_i$. The use of an
entropy efficient coding would reduce the code length to

$$L = \sum_i N_i H_i$$

where $H_i$ is the entropy of the distribution of time intervals between
events in the $i$-th process.
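
The three code lengths discussed above can be compared on a small example. The time stamps below are hypothetical clock counts of a single process, and the sketch charges each of the recorded intervals the same per-stamp cost in all three schemes.

```python
import math
from collections import Counter

# Hypothetical clock counts of the events of one process.
timestamps = [0, 3, 6, 9, 14, 17, 20, 23, 28, 31]
deltas = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
n = len(deltas)

# Direct recording: every stamp needs about log2(T) bits, T = execution time.
direct_bits = n * math.log2(timestamps[-1])
# Elapsed-time recording: log2 of the largest interval per stamp.
delta_bits = n * math.log2(max(deltas))
# Entropy-efficient coding: H bits per stamp, H = entropy of the interval distribution.
histogram = Counter(deltas)
entropy = -sum(c / n * math.log2(c / n) for c in histogram.values())
entropy_bits = n * entropy

print(f"direct:  {direct_bits:.1f} bits")
print(f"elapsed: {delta_bits:.1f} bits")
print(f"entropy: {entropy_bits:.1f} bits")
```

Because only two distinct interval lengths occur in this example, the entropy-based length is well below both fixed-width encodings.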

The time stamps are used for evaluation of the program performance,
ordering of the program events and for displaying of program traces.
None of these applications requires exact times; each can tolerate
approximate values of time stamps as long as the approximation does
not change event ordering and event resolution. For the approximation,
it is essential not to move events relative to each other significantly. This
requirement can be formulated in terms of event chronology, defined as a
chronologically ordered sequence of times of events in all processes of
the program. Let $\tau_{kl}$ be the time interval between neighbor events $k$ and $l$
and $\tau_l$ be the time interval between $l$ and its neighbor event in the event
chronology, see Figure 3.


FIGURE 3. Combining events on different processes into event chronology

We will require that the approximation $a_{kl}$ of the time stamp $\tau_{kl}$ of any
event is within $\varepsilon$ times the length of the interval between neighbor
events:

$$|\tau_{kl} - a_{kl}| \le \varepsilon \tau_l$$

Let $p_i = P\{\tau_{kl} = i\}$ and $r = \varepsilon \tau_l$. If we use the approximation of time stamps
with the precision $\varepsilon$, then instead of the random variable $\tau$ with the
distribution $\{p_i\}$ we have to code the random variable $a$ with the distribution
$\{q_m\}$:

$$q_m = \sum_{mr \le i < (m+1)r} p_i = P\big(m \varepsilon \tau_l \le \tau < (m+1) \varepsilon \tau_l\big)$$

For the entropy of $q$ we have:

$$\begin{aligned}
H(q) &= -\sum_m q_m \log q_m \\
&= -\sum_i p_i \log p_i + \sum_m \sum_{mr \le i < (m+1)r} p_i \log \frac{p_i}{q_m} \\
&= H(p) + \sum_m q_m \sum_{mr \le i < (m+1)r} \frac{p_i}{q_m} \log \frac{p_i}{q_m} \\
&= H(p) - \sum_m q_m H(p \mid q_m) = H(p) - H(p \mid q) \le H(p)
\end{aligned}$$

where $H(p \mid q)$ is the conditional entropy of the distribution $\{p_i\}$ relative to
the distribution $\{q_m\}$ [1].
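
The effect of the approximation on the entropy can be illustrated numerically. In the sketch below the distribution of interval lengths is hypothetical, and the bucket width r is taken to be an integer so that bucket m collects the lengths i with mr <= i < (m+1)r.

```python
import math


def entropy(distribution):
    """Shannon entropy (base 2) of a {value: probability} mapping."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


def quantize(p, r):
    """Merge interval lengths into buckets m = i // r, i.e. m*r <= i < (m+1)*r."""
    q = {}
    for i, p_i in p.items():
        q[i // r] = q.get(i // r, 0.0) + p_i
    return q


# Hypothetical distribution of interval lengths (in clock counts).
p = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
q = quantize(p, r=2)
print(f"H(p) = {entropy(p):.2f} bits, H(q) = {entropy(q):.2f} bits")
```

Merging lengths 2 and 3 into one bucket removes half a bit per time stamp here, which is the conditional entropy term subtracted in the derivation above.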
8. Application to AIMS Traces


We calculated the entropy of events, the entropy of time stamps and the
communication entropy of AIMS traces of 4 fluid dynamics programs. The results
are shown in Tables 2-3. The basic data about the programs are listed in
Table 1.

A small entropy relative to the trace size indicates that a significant number
of patterns exists in program traces. For example, a small value of
communication entropy in column 3 of Table 3 indicates that processors
send messages in a very regular pattern. Table 4 compares the theoretical
lower bound of the trace code length with the length of the code generated
by the compress and gzip utilities.


TABLE 1. Basic data on the programs

Program name                  t1        t2        t3        t4
Number of Processors          10        27        11        64
Communication Library         MPI       PVM       PVM       MPI
Total Execution Time (s)      313.3     1987.3    1776.0    11.4


TABLE 2. Time stamps entropy as function of time precision

Time precision                t1        t2        t3        t4
10%                           457       2588      1074      1273
5%                            653       3985      1323      3831
2%                            958       6510      1645      9062
1%                            1234      8748      1899      15578
0.5%                          1517      11335     2126      20730


TABLE 3. Minimal code lengths

                              t1        t2        t3        t4
Send code length              301       290       46        23052
Receive code length           196       290       99        23052
Event code length             831       8748      3914      58295
Markov chain state number     4519      4234      4261      3485
TABLE 4. Comparison of Sizes of Different Trace Encodings

                                        t1        t2        t3        t4
Entropy minimal code length
(0.5% time approximation)               30897     422879    61453     2867455
Actual trace size                       278544    1999818   378893    15269659
Number of records                       7013      96104     13817     685581
UNIX compressed                         82341     639297    124001    3505437
gzipped                                 60804     482948    93358     3037194
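
The reduction factors implied by Table 4 can be computed directly from its rows. The sketch below only divides the actual trace size by the size of each encoding, assuming all rows of the table are expressed in the same units; the dictionary layout is introduced here for the illustration.

```python
# Rows of Table 4, one entry per program t1..t4.
programs = ["t1", "t2", "t3", "t4"]
actual = [278544, 1999818, 378893, 15269659]
encodings = {
    "entropy": [30897, 422879, 61453, 2867455],
    "compress": [82341, 639297, 124001, 3505437],
    "gzip": [60804, 482948, 93358, 3037194],
}

# Reduction factor = actual trace size / encoded size.
ratios = {
    program: {name: actual[k] / sizes[k] for name, sizes in encodings.items()}
    for k, program in enumerate(programs)
}
for program in programs:
    row = ", ".join(f"{name} {value:.2f}" for name, value in ratios[program].items())
    print(f"{program}: {row}")
```

For these four traces the entropy bound corresponds to reduction factors between roughly 4.7 and 9.0, and it is smaller than the gzip output in every column.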


9. Conclusions 

In this paper we have studied the information content of traces of mes- 
sage passing programs. The information content is measured as the sum 
of the entropy of the trace events, the entropy of program communica- 
tions and the entropy of time stamps. 

The information content of program traces is significantly smaller than 
the trace size. There are several reasons for this: (1) event entropy is 
related to the entropy of the Markov chain of the program, (2) the mes- 
sages of common programs have well defined patterns, and (3) the time 
stamp resolution can be decreased significantly without affecting the 
event causality. Indeed, we found by direct calculation of the entropy of
traces of 4 programs that the entropy is smaller by a factor of 5 than the
trace size. This also indicates that a significant number of patterns exists
in those program traces.

A practical conclusion of this study is that an entropy efficient encoding 
of program traces will result in significant reduction of trace sizes. 